I’ve heard many stories of tourists being confused by our city names: they want to take the train from Ghent to “Liège”, but there is only one to “Luik”. Or they are driving south towards France and their GPS tells them to follow the direction “Mons”, but as long as they are on Flemish territory the signs will say “Bergen”.
Luik/Liège, Bergen/Mons… It is the same city. The first is the Flemish name, the second the French.
But what I didn’t know is how many towns and cities we have that carry two official names.
I found everything I needed on (this website) from the Belgian government.
I started by loading the packages I needed:
#packages for the data exploration
library(tidyverse)
library(readxl)
library(ggplot2)
#packages for the maps
library(sp)
## Warning: package 'sp' was built under R version 3.4.3
library(tmap)
## Warning: package 'tmap' was built under R version 3.4.3
library(viridisLite)
## Warning: package 'viridisLite' was built under R version 3.4.3
library(leaflet)
## Warning: package 'leaflet' was built under R version 3.4.3
#Importing the data
raw_data <- read_excel("TF_SOC_POP_STRUCT_2017_tcm325-283761.xlsx", sheet=1)
#Keeping only the variables needed
data <- raw_data %>%
select(contains("MUNTY"), TX_RGN_DESCR_NL, CD_SEX, TX_NATLTY_NL, TX_CIV_STS_NL, CD_AGE, MS_POPULATION)
colnames(data) <- c("REFNIS", "TownNL", "TownFR", "Region", "Sex", "Nationality", "MaritalStatus", "Age", "Population")
#Translating Region names to English
data$Region <- data$Region %>%
str_replace("Vlaams Gewest", "Flanders") %>%
str_replace("Waals Gewest", "Wallonia") %>%
str_replace("Brussels Hoofdstedelijk Gewest", "Brussels agglomeration")
After importing, the data contained a lot of administrative variables I didn’t need. Additionally, the data does not contain a total population by town, because it is divided into demographic subsets. A bit of dplyr filtering showed me that in my home town there are fewer than 30 people with the same characteristics as me (female, unmarried, Belgian, age 34), but that’s not really the most interesting part.
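The filtering itself is not shown here, so this is only a minimal sketch of what it could look like. The toy tibble stands in for `data`; the town name `"MyTown"`, the category labels (`"F"`, `"Belgen"`, `"Ongehuwd"`, `"Gehuwd"`) and the counts are assumptions for illustration, not values taken from the real file.

```r
library(dplyr)

# Toy stand-in for `data`; labels and counts are invented for illustration
toy <- tibble::tibble(
  TownNL        = c("MyTown", "MyTown", "MyTown", "OtherTown"),
  Sex           = c("F", "F", "M", "F"),
  Nationality   = c("Belgen", "Belgen", "Belgen", "Belgen"),
  MaritalStatus = c("Ongehuwd", "Gehuwd", "Ongehuwd", "Ongehuwd"),
  Age           = c(34, 34, 34, 34),
  Population    = c(27, 110, 31, 12)
)

# Each row is one demographic subset, so a single filter isolates "people like me"
like_me <- toy %>%
  filter(TownNL == "MyTown", Sex == "F", Nationality == "Belgen",
         MaritalStatus == "Ongehuwd", Age == 34)

like_me$Population
```

Because every combination of town, sex, nationality, marital status and age is its own row, the `Population` value of the single remaining row is the head count for that group.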
Using dplyr I created a population table, and immediately added a column to compare Town Names in Flemish and French.
#Creating a dataframe with total population for each town, and adding a column to see whether they have the same name
popdata <- data %>%
group_by(TownNL, TownFR, Region, REFNIS) %>%
summarise(population=sum(Population)) %>%
arrange(desc(population)) %>%
mutate(SameName = TownNL==TownFR) %>%
ungroup()
#Noticing an issue:
popdata%>%
filter(Region=="Flanders") %>%
filter(!SameName) %>%
print(n=11)
## # A tibble: 45 x 6
## TownNL TownFR Region
## <chr> <chr> <chr>
## 1 Antwerpen Anvers Flanders
## 2 Gent Gand Flanders
## 3 Brugge Bruges Flanders
## 4 Leuven Louvain Flanders
## 5 Mechelen Malines Flanders
## 6 Aalst (Aalst) Alost (Alost) Flanders
## 7 Sint-Niklaas (Sint-Niklaas) Saint-Nicolas (Saint-Nicolas) Flanders
## 8 Kortrijk Courtrai Flanders
## 9 Oostende Ostende Flanders
## 10 Roeselare Roulers Flanders
## 11 Beveren (Sint-Niklaas) Beveren (Saint-Nicolas) Flanders
## # ... with 34 more rows, and 3 more variables: REFNIS <chr>,
## # population <dbl>, SameName <lgl>
Luckily, I noticed an issue quickly: some town names were annotated with their district. Beveren has the same name in Flemish and French, but its district gets translated. To get rid of the districts, I removed anything between brackets and redid the comparison to find out where town names are different.
#Removing the sectors between brackets
popdata$TownNL <- str_replace(popdata$TownNL, pattern="\\s\\(.+\\)", replacement="")
popdata$TownFR <- str_replace(popdata$TownFR, pattern="\\s\\(.+\\)", replacement="")
#Reassessing whether the names are the same, and removing the previous SameName column to avoid confusion
popdata <- popdata %>%
mutate(DiffName = TownNL != TownFR) %>%
select(TownNL, TownFR, DiffName, population, Region, REFNIS)
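To see what the pattern does, here is a small sketch on a few names copied from the output above. The helper name `strip_district` is hypothetical and only used here for illustration.

```r
library(stringr)

# Sample names copied from the printed tibble above
names_nl <- c("Aalst (Aalst)", "Beveren (Sint-Niklaas)", "Gent")
names_fr <- c("Alost (Alost)", "Beveren (Saint-Nicolas)", "Gand")

# "\\s\\(.+\\)" matches one whitespace character followed by a bracketed group;
# strip_district is a hypothetical helper wrapping the same str_replace call
strip_district <- function(x) str_replace(x, "\\s\\(.+\\)", "")

clean_nl <- strip_district(names_nl)
clean_fr <- strip_district(names_fr)

clean_nl              # "Aalst" "Beveren" "Gent"
clean_nl != clean_fr  # TRUE FALSE TRUE
```

Once the districts are gone, Beveren no longer shows up as a town with two names, while Aalst/Alost and Gent/Gand still do. Note that `.+` is greedy, which is fine here because each name carries at most one bracketed group at the end.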
There are 95 towns and cities with two different official names, which is 16% of all towns. Contrary to what some people assume, the share is similar in both regions: 13% of Flemish towns also have an official French name, and 16% of Walloon towns also have an official Flemish name. Only Brussels, an officially bilingual region, has a much higher percentage of ‘double names’.
#How many have two different names?
sum(popdata$DiffName)
## [1] 95
mean(popdata$DiffName)
## [1] 0.1612903
#by region
summary <- popdata %>%
group_by(Region) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
knitr::kable(summary)
| Region | NTowns | N_SameName | N_DiffName | Prop_SameName | Prop_DiffName |
|---|---|---|---|---|---|
| Brussels agglomeration | 19 | 6 | 13 | 0.32 | 0.68 |
| Flanders | 308 | 269 | 39 | 0.87 | 0.13 |
| Wallonia | 262 | 219 | 43 | 0.84 | 0.16 |
Using tmap I created my first two maps: one that shows the regions of Belgium, and a second one for comparison that highlights only the towns with two official names.
#Importing SPdataframe for Belgium
data("BE_ADMIN_MUNTY", package="BelgiumMaps.StatBel")
#Merging my 2017 data with the SPdataframe
mapdata <- merge(BE_ADMIN_MUNTY, popdata, by.x = "CD_MUNTY_REFNIS", by.y = "REFNIS")
#Making a file containing only the towns with different names
popdata_DiffName <- popdata %>%
filter(DiffName==TRUE)
mapdataDiffName <- merge(BE_ADMIN_MUNTY, popdata_DiffName, by.x = "CD_MUNTY_REFNIS", by.y = "REFNIS")
#Creating a colour palette
virpalette <- rev(viridis(3))
#Plot different regions
regionplot<- tm_shape(mapdata) +
tm_fill(col="Region", palette=virpalette,
title = "Regions in Belgium")+
tm_polygons(id="TownNL")+
tm_layout(legend.position = c("left", "bottom"))
#Plot to show those with different names by region
nameplot <- tm_shape(mapdataDiffName) +
tm_fill(col="Region", palette=virpalette, id="TownNL",
colorNA = "gray90", textNA="Same name",
title = "Different regional town names",legend.position = c("left", "bottom" ),
popup.vars = c("TownNL","TownFR", "population", "Reason"))+
tm_polygons(id="TownNL", "TownFR")+
tm_layout(legend.position = c("left", "bottom"))
tmap_arrange(regionplot, nameplot)
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).
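The warning most likely comes from combining `tm_fill()` with `tm_polygons()` under a single `tm_shape()`: `tm_polygons()` is itself a fill plus borders, so the fill layer ends up in the group twice. Below is a minimal sketch of one way to avoid that. It uses the `World` example data bundled with tmap as a stand-in for `mapdata` (an assumption, so the snippet runs without the Belgian map data), and `worldplot` is a hypothetical name.

```r
library(tmap)

# `World` ships with tmap and stands in for `mapdata` here
data("World", package = "tmap")

# tm_fill() colours the polygons and tm_borders() draws the outlines;
# together they cover what tm_polygons() does, without a duplicated fill layer
worldplot <- tm_shape(World) +
  tm_fill(col = "continent") +
  tm_borders()
```

Printing `worldplot` draws the map. Applied to the maps above, the same idea would mean swapping the `tm_polygons()` calls for `tm_borders()`.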
A few things to notice: there is a slightly higher concentration of towns with two official names around the language border, but that doesn’t really explain the full picture.
In the table above, the Brussels region clearly has a much higher share of towns with two official names: 68% versus the country average of 16%. Given Brussels’ bilingual status, that should not come as a surprise. I was actually more surprised to realize that there are still 6 that only have their original Flemish name, and some of them, like “Ganshoren”, aren’t really that easy to pronounce.
#Checking the data on Brussels
popdata %>%
filter(Region=="Brussels agglomeration") %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 19 6 13 0.32 0.68
#List of names for Brussels
popdata %>%
filter(Region=="Brussels agglomeration") %>%
group_by(DiffName) %>%
arrange(desc(DiffName), desc(population))
## # A tibble: 19 x 6
## # Groups: DiffName [2]
## TownNL TownFR DiffName population
## <chr> <chr> <lgl> <dbl>
## 1 Brussel Bruxelles TRUE 176545
## 2 Schaarbeek Schaerbeek TRUE 133042
## 3 Sint-Jans-Molenbeek Molenbeek-Saint-Jean TRUE 96629
## 4 Elsene Ixelles TRUE 86244
## 5 Ukkel Uccle TRUE 82307
## 6 Vorst Forest TRUE 55746
## 7 Sint-Lambrechts-Woluwe Woluwe-Saint-Lambert TRUE 55216
## 8 Sint-Gillis Saint-Gilles TRUE 50471
## 9 Sint-Pieters-Woluwe Woluwe-Saint-Pierre TRUE 41217
## 10 Oudergem Auderghem TRUE 33313
## 11 Sint-Joost-ten-Node Saint-Josse-ten-Noode TRUE 27115
## 12 Watermaal-Bosvoorde Watermael-Boitsfort TRUE 24871
## 13 Sint-Agatha-Berchem Berchem-Sainte-Agathe TRUE 24701
## 14 Anderlecht Anderlecht FALSE 118241
## 15 Jette Jette FALSE 51933
## 16 Etterbeek Etterbeek FALSE 47414
## 17 Evere Evere FALSE 40394
## 18 Ganshoren Ganshoren FALSE 24596
## 19 Koekelberg Koekelberg FALSE 21609
## # ... with 2 more variables: Region <chr>, REFNIS <chr>
#Adding a column to note down the reason for different names
reason_BXL <- popdata %>%
filter(Region=="Brussels agglomeration") %>%
filter(DiffName) %>%
mutate(Reason = "Brussels")
Cities are generally more important, and I would have guessed that most of our cities have two official names. Just looking at the difference in average population between towns that have two names (TRUE) and those that don’t, there clearly is a skew towards more populous towns. A quick plot in ggplot confirms this: grey shows all the towns in Belgium by population size on a logarithmic scale, and I coloured those with two names in green.
popdata %>%
group_by(DiffName) %>%
summarise(mean=mean(population), median=median(population))
## # A tibble: 2 x 3
## DiffName mean median
## <lgl> <dbl> <dbl>
## 1 FALSE 14744.06 11383
## 2 TRUE 42510.78 24701
#Plotting the population distribution of all towns, highlighting those with two names
ggplot()+
geom_histogram(data=popdata, aes(x=population), fill="grey", alpha=0.6)+
geom_histogram(data=subset(popdata, DiffName==TRUE), aes(x=population), fill="cadetblue4", alpha=1)+
scale_x_log10()+
labs(x= "Population", y="Number of towns", title="Size of towns with two official names amongst all towns in Belgium")
I took a shortcut to define our cities: the 10% most populous towns, which works out to a cut-off of roughly 34,000 inhabitants.
#10% largest towns and cities in Belgium
quantile(popdata$population, probs = seq(from = 0, to = 1, by = .1))
## 0% 10% 20% 30% 40% 50% 60% 70%
## 89.0 4372.2 6341.8 8308.4 10268.4 12123.0 14649.6 18473.6
## 80% 90% 100%
## 23259.6 34189.8 520504.0
#Proportion of Cities with different names
popdata %>%
filter(population > 34000) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 60 27 33 0.45 0.55
#Adding a reason column
reason_city <- popdata %>%
filter(population > 34000) %>%
filter(Region != "Brussels agglomeration") %>%
filter(DiffName) %>%
mutate(Reason = "City")
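The cut-off above is typed in by hand after reading it off the quantile output. As a small sketch of an alternative, the threshold can be computed and reused directly. The toy tibble below is invented for illustration, and the names `cutoff` and `cities` are hypothetical.

```r
library(dplyr)

# Toy populations, invented for illustration; the real `popdata` has one row per town
toy <- tibble::tibble(
  TownNL     = paste0("Town", 1:10),
  population = c(900, 4000, 6500, 8000, 10000, 12000, 15000, 19000, 24000, 180000),
  DiffName   = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# 90th percentile computed from the data instead of a hard-coded number
cutoff <- quantile(toy$population, probs = 0.9)

# Keep only the towns above that percentile
cities <- toy %>% filter(population > cutoff)
cities
```

On the real data this would use the exact 90% quantile (34189.8) rather than the rounded 34000, so the number of towns selected could differ slightly from the 60 shown above.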
After World War I, the Treaty of Versailles listed the annexation of 9 German towns into Belgium as compensation. They make up our third language region, as German is still their main language today. Given that German and Dutch are both Germanic languages with a lot of similarities, it would make sense for the Flemish to use the German town names, while the French have changed some of them.
#Listing the German-speaking communes and the two additional towns with German facilities
germanspeaking <- c("Eupen", "Kelmis", "Lontzen", "Raeren", "Amel", "Büllingen", "Burg-Reuland", "Bütgenbach",
"Sankt Vith", "Malmedy", "Weismes")
#Proportion of German-region towns with different names
popdata %>%
filter(TownNL %in% germanspeaking) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 11 5 6 0.45 0.55
#German towns with two official names
popdata %>%
filter(TownNL %in% germanspeaking) %>%
filter(DiffName==TRUE) %>%
print(n=nrow(.))
## # A tibble: 6 x 6
## TownNL TownFR DiffName population Region REFNIS
## <chr> <chr> <lgl> <dbl> <chr> <chr>
## 1 Kelmis La Calamine TRUE 10964 Wallonia 63040
## 2 Sankt Vith Saint-Vith TRUE 9661 Wallonia 63067
## 3 Weismes Waimes TRUE 7493 Wallonia 63080
## 4 Bütgenbach Butgenbach TRUE 5583 Wallonia 63013
## 5 Amel Amblève TRUE 5523 Wallonia 63001
## 6 Büllingen Bullange TRUE 5489 Wallonia 63012
#Adding a reason column
reason_german <- popdata %>%
filter(TownNL %in% germanspeaking) %>%
filter(DiffName) %>%
mutate(Reason = "German region")
Always a topic for debate in Belgium: the towns with official language facilities. These are towns that belong to one region but they have some degree of bilingual facilities (it’s complicated!).
#Listing all towns with language facilities
faciliteiten <- c("Bever", "Drogenbos", "Herstappe", "Kraainem", "Linkebeek", "Mesen", "Ronse",
"Sint-Genesius-Rode", "Spiere-Helkijn", "Voeren", "Wemmel", "Wezembeek-Oppem",
"Edingen", "Komen-Waasten", "Moeskroen", "Vloesberg")
#Proportion of towns with language facilities that have different names
popdata %>%
filter(TownNL %in% faciliteiten) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 16 6 10 0.38 0.62
#Which towns have different names?
popdata %>%
filter(TownNL %in% faciliteiten) %>%
filter(DiffName==TRUE) %>%
print(n=nrow(.))
## # A tibble: 10 x 6
## TownNL TownFR DiffName population Region
## <chr> <chr> <lgl> <dbl> <chr>
## 1 Moeskroen Mouscron TRUE 57773 Wallonia
## 2 Ronse Renaix TRUE 26092 Flanders
## 3 Sint-Genesius-Rode Rhode-Saint-Genèse TRUE 18231 Flanders
## 4 Komen-Waasten Comines-Warneton TRUE 18102 Wallonia
## 5 Edingen Enghien TRUE 13563 Wallonia
## 6 Voeren Fourons TRUE 4129 Flanders
## 7 Vloesberg Flobecq TRUE 3426 Wallonia
## 8 Bever Biévène TRUE 2160 Flanders
## 9 Spiere-Helkijn Espierres-Helchin TRUE 2142 Flanders
## 10 Mesen Messines TRUE 1049 Flanders
## # ... with 1 more variables: REFNIS <chr>
#Adding a reason column
reason_facilities <- popdata %>%
filter(TownNL %in% faciliteiten) %>%
filter(DiffName) %>%
anti_join(reason_city) %>%
mutate(Reason = "Language facilities")
To summarize, there are a few reasons why towns have different official names:

* They are part of a bilingual region (Brussels)
* They are a larger city
* They are part of the German region
* They have language facilities
* They are close to the language border

Along the way I added additional reason columns, which I now want to merge into the mapdata:
#Creating a reason column for all other towns with two names
reason_other <- popdata %>%
filter(DiffName) %>%
anti_join(reason_city) %>%
anti_join(reason_BXL) %>%
anti_join(reason_german) %>%
anti_join(reason_facilities) %>%
mutate(Reason = "Other")
#Merging reasons into one dataframe
reason <- bind_rows(reason_BXL, reason_city, reason_german, reason_facilities, reason_other)
#Searching for duplicates before join
reason %>%
group_by(REFNIS) %>%
filter(n() > 1)
## # A tibble: 0 x 7
## # Groups: REFNIS [0]
## # ... with 7 variables: TownNL <chr>, TownFR <chr>, DiffName <lgl>,
## # population <dbl>, Region <chr>, REFNIS <chr>, Reason <chr>
#Joining reasons into the main dataframe
popdata_reason <- left_join(popdata, reason)
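The chain of `anti_join()` calls gives each town exactly one reason, with earlier reasons taking priority. As a sketch of an alternative under the same priority order, a single `case_when()` could assign the reason in one step. The toy rows reuse names and populations from the outputs above, except `"ExampleTown"`, which is a made-up placeholder; the shortened lookup vectors are also just for illustration.

```r
library(dplyr)

# Toy rows; "ExampleTown" and its population are invented placeholders
toy <- tibble::tibble(
  TownNL     = c("Brussel", "Moeskroen", "Kelmis", "Ronse", "Jette", "ExampleTown"),
  Region     = c("Brussels agglomeration", "Wallonia", "Wallonia",
                 "Flanders", "Brussels agglomeration", "Flanders"),
  population = c(176545, 57773, 10964, 26092, 51933, 5000),
  DiffName   = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE)
)

# Shortened lookup vectors, for illustration only
germanspeaking <- c("Kelmis")
faciliteiten   <- c("Ronse", "Moeskroen")

# case_when() takes the first matching condition, so the order sets the priority
toy_reason <- toy %>%
  mutate(Reason = case_when(
    !DiffName                          ~ NA_character_,
    Region == "Brussels agglomeration" ~ "Brussels",
    population > 34000                 ~ "City",
    TownNL %in% germanspeaking         ~ "German region",
    TownNL %in% faciliteiten           ~ "Language facilities",
    TRUE                               ~ "Other"
  ))

toy_reason %>% select(TownNL, Reason)
```

Moeskroen is both a city and a town with language facilities; because the city condition comes first it is labelled "City", which mirrors the effect of `anti_join(reason_city)` in the facilities step.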